Abstract

The pressure of fundamental limits on classical computation and the promise of exponential speedups
from quantum effects have recently brought quantum circuits [10] to the attention of the Electronic
Design Automation community [18, 28, 7, 27, 17]. We discuss efficient quantum logic circuits which
perform two tasks: (i) implementing generic quantum computations and (ii) initializing quantum reg- 
isters. In contrast to conventional computing, the latter task is nontrivial because the state-space of an 
n-qubit register is not finite and contains exponential superpositions of classical bit strings. Our proposed
circuits are asymptotically optimal for respective tasks and improve earlier published results by at least 
a factor of two. 
The circuits for generic quantum computation constructed by our algorithms are the most efficient
known today in terms of the number of difficult gates (quantum controlled-NOTs). They are based
on an analogue of the Shannon decomposition of Boolean functions and a new circuit block, quantum 
multiplexor, that generalizes several known constructions. A theoretical lower bound implies that our 
circuits cannot be improved by more than a factor of two. We additionally show how to accommodate 
the severe architectural limitation of using only nearest-neighbor gates that is representative of current 
implementation technologies. This increases the number of gates by almost an order of magnitude, but 
preserves the asymptotic optimality of gate counts. 
1 Introduction

As the ever-shrinking transistor approaches atomic proportions, Moore's law must confront the small-scale 
granularity of the world: we cannot build wires thinner than atoms. Worse still, at atomic dimensions 
we must contend with the laws of quantum mechanics. For example, suppose one bit is encoded as the 
presence or the absence of an electron in a small region.¹ Since we know very precisely where the electron
is located, the Heisenberg uncertainty principle dictates that we cannot know its momentum with high
accuracy. Without a reasonable upper bound on the electron's momentum, there is no alternative but to 
use a large potential to keep it in place, and expend significant energy during logic switching. A quantitative
analysis of these phenomena leads experts from NCSU, SRC and Intel [36] to derive fundamental
limitations on the scalability of any computing device which moves electrons. 

Yet these same quantum effects also facilitate a radically different form of computation [13]. Theoretically,
quantum computers could outperform their classical counterparts when solving certain discrete

1 Most current computing technologies use electron charges to store information; exceptions include spintronics-based techniques,
e.g., magnetic RAM. 



problems [16]. For example, a successful large-scale implementation of Shor's integer factorization [29]
would compromise the RSA cryptosystem used in electronic commerce. On the other hand, quantum effects
may also be exploited for public-key cryptography [4]. Indeed, such cryptography systems, based
on single-photon communication, are commercially available from MagiQ Technologies in the U.S. and 
IdQuantique in Europe. 

Physically, a quantum bit might be stored in one of a variety of quantum-mechanical systems. A broad 
survey of these implementation technologies, with feasibility estimates and forecasts, is available in the 
form of the ARDA quantum computing roadmap [1]. Sample carriers of quantum information include
top-electrons in hyperfine energy levels of either trapped atoms or trapped ions, tunneling currents in 
cold superconductors, nuclear spin polarizations in nuclear magnetic resonance, and polarization states of 
single photons. A collection of n such systems would comprise an n-qubit register, and quantum logic gates 
(controlled quantum processes) would then be applied to the register to perform a computation. In practice, 
such gates might result from rotating the electron between hyperfine levels by shining a laser beam on the 
trapped atom/ion, tuning the tunneling potential by changing voltages and/or current in a super-conducting 
circuit, or perhaps passing multiple photons through very efficient nonlinear optical media. 

The logical properties of qubits also differ significantly from those of classical bits. Bits and their 
manipulation can be described using two constants (0 and 1) and the tools of Boolean algebra. Qubits, on
the other hand, must be discussed in terms of vectors, matrices, and other linear algebraic constructions.
We will fully specify the formalism in Section 2 but give a rough idea of the similarities and differences
between classical and quantum information below. 

1. A readout (observation, measurement) of a quantum register results in a classical bit-string.

2. However, identically prepared quantum states may yield different classical bit-strings upon obser- 
vation. Quantum physics only predicts the probability of each possible readout, and the readout 
probabilities of different bits in the register need not be independent. 

3. After readout, the state "collapses" onto the classical bit string observed. All other quantum data is 
lost. 

These differences notwithstanding, quantum logic circuits, from a high level perspective, exhibit many 
similarities with their classical counterparts. They consist of quantum gates, connected (though without 
fanout or feedback) by quantum wires which carry quantum bits. Moreover, logic synthesis for quantum 
circuits is as important as for the classical case. In current implementation technologies, gates that act 
on three or more qubits are prohibitively difficult to implement directly. Thus, implementing a quantum 
computation as a sequence of two-qubit gates is of crucial importance. Two-qubit gates may in turn be 
decomposed into circuits containing one-qubit gates and a standard two-qubit gate, usually the quantum 
controlled-not (CNOT). These decompositions are done by hand for published quantum algorithms (e.g.,
Shor's factorization algorithm [29] or Grover's quantum search [16]) but have long been known to be
possible for arbitrary quantum functions [12, 3]. While CNOTs are used in an overwhelming majority of
theoretical and practical work in quantum circuits, their implementations are orders of magnitude more 
error-prone than implementations of single-qubit gates and have longer durations. Therefore, the cost of 
a quantum circuit can be realistically calculated by counting CNOT gates. Moreover, it has been shown 
previously that if CNOT is the only two-qubit gate type used, the number of such gates in a sufficiently 
large irredundant circuit is lower-bounded by approximately 20% [27].

The first quantum logic synthesis algorithm to so decompose an arbitrary n-qubit gate would return a
circuit containing $O(n^3 4^n)$ CNOT gates [3]. The work in [9] interprets this algorithm as the QR decomposition,
well-known in matrix algebra. Improvements on this method have used clever circuit transformations
and/or Gray codes [20, 2, 31] to lower this gate count. More recently, different techniques [21] have led to



circuits with CNOT-counts of $4^n - 2^{n+1}$. The exponential gate count is not unexpected: just as the exponential
number of n-bit Boolean functions ensures that the circuits computing them are generically large, so too
in the quantum case. Indeed, it has been shown that n-qubit operators generically require $\lceil \frac{1}{4}(4^n - 3n - 1) \rceil$
CNOTs [27]. Similar exponential lower bounds existed earlier in other gate libraries [20].

Existing algorithms for n-qubit circuit synthesis remain a factor of four away from lower bounds and 
fare poorly for small n. These algorithms require at least 8 CNOT gates for n = 2, while three CNOT gates
are necessary and sufficient in the worst case [27, 34, 33]. Further, a simple procedure exists to produce
two-qubit circuits with minimal possible number of CNOT gates [25]. In contrast, in three qubits the lower
bound is 14 while the generic n-qubit decomposition of [21] achieves 48 CNOTs and a specialty 3-qubit
circuit of [32] achieves 40.

In this work, we focus on identifying useful quantum circuit blocks. To this end, we analyze quantum 
conditionals and define quantum multiplexors that generalize CNOT, Toffoli and Fredkin gates. Such 
quantum multiplexors implement if-then-else conditionals when the controlling predicate evaluates to a 
coherent superposition of $|0\rangle$ and $|1\rangle$. We find that quantum multiplexors prove amenable to recursive
decomposition and vastly simplify the discussion of many results in quantum logic synthesis (cf. [8, 31, 21]).
Ultimately, our analysis leads to a quantum analogue of the Shannon decomposition, which we apply
to the problem of quantum logic synthesis. 

We contribute the following key results. 

• An arbitrary n-qubit quantum state can be prepared by a circuit containing no more than $2^{n+1} - 2n$
CNOT gates. This lies a factor of four away from the theoretical lower bound. 

• An arbitrary n-qubit operator can be implemented in a circuit containing no more than
$(23/48) \times 4^n - (3/2) \times 2^n + 4/3$ CNOT gates. This improves upon the best previously published work by a
factor of two and lies less than a factor of two away from the theoretical lower bound. 

• In the special case of three qubits, our technique yields a circuit with 20 CNOT gates, whereas the 
best previously known result was 40. 

• The architectural limitation of permitting only nearest-neighbor interactions, common to physical 
implementations, does not change the asymptotic behavior of our techniques. 
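
The gate counts quoted in this introduction can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original work; the function names are illustrative only, and the formulas are exactly the ones stated in the text above.

```python
from fractions import Fraction
from math import ceil


def lower_bound_cnots(n):
    """Generic lower bound ceil((4^n - 3n - 1) / 4) quoted in the text."""
    return ceil(Fraction(4**n - 3 * n - 1, 4))


def prior_cnots(n):
    """CNOT count 4^n - 2^(n+1) attributed to the earlier generic decomposition."""
    return 4**n - 2 ** (n + 1)


def operator_cnots(n):
    """Upper bound (23/48) 4^n - (3/2) 2^n + 4/3 claimed for arbitrary operators."""
    return Fraction(23, 48) * 4**n - Fraction(3, 2) * 2**n + Fraction(4, 3)


def state_preparation_cnots(n):
    """Upper bound 2^(n+1) - 2n claimed for state preparation."""
    return 2 ** (n + 1) - 2 * n


for n in range(2, 6):
    print(n, lower_bound_cnots(n), operator_cnots(n), prior_cnots(n),
          state_preparation_cnots(n))
```

For n = 3 the sketch reproduces the figures mentioned in the text: a lower bound of 14, the earlier count of 48, and the new count of 20.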

In addition to these technical advances, we develop a theory of quantum multiplexors that parallels 
well-known concepts in digital logic, such as Shannon decomposition of Boolean functions. This new 
theory produces short and intuitive proofs of many results for n-qubit circuits known today. 

The remainder of the paper is organized as follows. In §2 we define quantum bits, quantum logic,
and quantum circuits, and we introduce the necessary mathematical formalism for manipulating them.
In §3 we introduce a novel circuit block, the quantum multiplexor, which immediately allows radical
notational simplifications of the statements and proofs of previously known results. In §4 we give a
novel, asymptotically-optimal algorithm for register initialization and indicate its applications to more
general problems in quantum logic synthesis. In §5 we use the Cosine-Sine decomposition, along with a
novel decomposition of single-select-bit quantum multiplexors, to derive a functional decomposition for
quantum logic that can be applied recursively. We obtain quantum circuits to simulate any unitary operator
(quantum evolution) U and present competitive gate counts. In §6 we show that our techniques adapt well
to severe implementation constraints representative of many quantum-circuit technologies. Our results are
summarized in §7, which concludes the paper. Additionally, two highly-technical aspects of our work
required to achieve the best gate counts are described in the Appendix. 



2 Background and Notation 

The notion of a qubit formalizes the logical properties of an ideal quantum-mechanical system with two 
basis states. The two states are labeled $|0\rangle$ and $|1\rangle$. They can be distinguished by quantum measurement of
the qubit, which yields a single classical bit of information, specifying which state the qubit was observed
in. However, the state of an isolated (in particular, unobserved) qubit must be modeled by a vector in a
two-dimensional complex² vector space $\mathcal{H}_1$, which is spanned by the basis states.

$$\mathcal{H}_1 = \mathrm{span}_{\mathbb{C}}\{|0\rangle, |1\rangle\} \qquad (1)$$

We identify $|0\rangle$ and $|1\rangle$ with the following column vectors.

$$|0\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \qquad |1\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \qquad (2)$$

Thus, an arbitrary state $|\phi\rangle \in \mathcal{H}_1$ can be written in either of the two equivalent forms given below.

$$|\phi\rangle = \alpha_0 |0\rangle + \alpha_1 |1\rangle = \begin{pmatrix} \alpha_0 \\ \alpha_1 \end{pmatrix} \qquad (3)$$

The entries of the state vector determine the readout probabilities: if we measure a qubit whose state is
described by $|\phi\rangle$, we should expect to see $|0\rangle$ with probability $|\alpha_0|^2$ and $|1\rangle$ with probability $|\alpha_1|^2$. Since
these are the only two possibilities, $\alpha_0$ and $\alpha_1$ are required to satisfy $|\alpha_0|^2 + |\alpha_1|^2 = 1$.
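
As a small illustration of the readout rule, the sketch below computes the two probabilities from the amplitudes of a one-qubit state and rejects vectors that violate the normalization condition. The function name and the tolerance are assumptions made for this example.

```python
import math


def readout_probabilities(alpha0, alpha1, tol=1e-9):
    """Return (P[readout 0], P[readout 1]) for the state alpha0|0> + alpha1|1>."""
    p0, p1 = abs(alpha0) ** 2, abs(alpha1) ** 2
    if abs(p0 + p1 - 1.0) > tol:
        raise ValueError("state vector is not normalized")
    return p0, p1


# An equal superposition with a relative phase of i between the amplitudes.
amp = 1 / math.sqrt(2)
p0, p1 = readout_probabilities(amp, 1j * amp)
print(p0, p1)
```

The relative phase between the two amplitudes does not affect either probability, which is why complex coefficients cannot be inferred from a single readout.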

2.1 Qubit Registers 

By a register of qubits, we shall simply mean a logical qubit array with a fixed number of qubits in a fixed
order. A readout of a qubit register amounts to readouts of each component qubit; thus a readout of an
n-qubit register might take the form $|b_0\rangle |b_1\rangle \cdots |b_{n-1}\rangle$ for each $b_j \in \{0, 1\}$. We shall abbreviate this to
$|b_0 b_1 \ldots b_{n-1}\rangle$, and call it a bitstring state. Just as for a single qubit, the state of an isolated qubit register
is modeled by a vector in the complex vector space spanned by the bitstring states.

$$\mathcal{H}_n = \mathrm{span}_{\mathbb{C}}\{\, |b\rangle \,;\, b \text{ a bitstring of length } n \,\} \qquad (4)$$

Writing $\mathbb{B}^n$ for the set of length-n bitstrings, an arbitrary vector $|\psi\rangle \in \mathcal{H}_n$ may be expressed as $\sum_{b \in \mathbb{B}^n} \alpha_b |b\rangle$
or as the column vector whose b-th entry is $\alpha_b$. As for a single qubit, $|\alpha_b|^2$ represents the probability that
a readout of $|\psi\rangle$ yields the bitstring $|b\rangle$; thus the $\alpha_b$ are subject to the relation $\sum_b |\alpha_b|^2 = 1$.

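Suppose we concatenate an $\ell$-qubit register L and an m-qubit register M to form an $\ell + m = n$-qubit
register N. Assuming L and M have not previously interacted (and remain independent), we may describe
them by state vectors $|\psi_L\rangle \in \mathcal{H}_\ell$ and $|\psi_M\rangle \in \mathcal{H}_m$.

$$|\psi_L\rangle = \sum_{b \in \mathbb{B}^\ell} \lambda_b |b\rangle \qquad |\psi_M\rangle = \sum_{b' \in \mathbb{B}^m} \mu_{b'} |b'\rangle \qquad (5)$$

To describe the state of N, we must somehow obtain from $|\psi_L\rangle$ and $|\psi_M\rangle$ a state vector $|\psi_N\rangle \in \mathcal{H}_n$. Quantum
mechanics demands that we use a natural generalization of bitstring concatenation called the tensor
product. To compute the tensor product of two states, we write $|\psi_N\rangle = |\psi_L\rangle |\psi_M\rangle$, and expand it using the
distributive law.

$$|\psi_L\rangle |\psi_M\rangle = \sum_{b \in \mathbb{B}^\ell,\; b' \in \mathbb{B}^m} \lambda_b \mu_{b'} |b\rangle |b'\rangle \qquad (6)$$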



2 Complex rather than real coefficients are required in most applications. For example, in certain optical implementations
[22, §7.4.2] real and imaginary parts encode both the presence and phase of a photon.



Let $\cdot$ denote concatenation; then $|b\rangle |b'\rangle$ and $|b \cdot b'\rangle$ represent the same bitstring state. As $b \cdot b' \in \mathbb{B}^n$,
we have $|\psi_L\rangle |\psi_M\rangle \in \mathcal{H}_n$, as desired.

Perhaps counter-intuitively, the quantum-mechanical state of N cannot in general be specified only in
terms of the states of L and M. Indeed, $\mathcal{H}_k$ is a $2^k$-dimensional vector space, and for $n > 2$ we observe
$2^n > 2^m + 2^\ell$. For example, three independent qubits can be described by three two-dimensional vectors, while
a generic state-vector of a three-qubit system is eight-dimensional. Much interest in quantum computing
is driven by this exponential scaling of the state space, and the loss of independence between different
subsystems is called quantum entanglement.
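
The tensor product of Equation 6 and the notion of entanglement can be made concrete for two qubits. The sketch below is illustrative only: the helper names are hypothetical, and the separability test relies on the standard fact that a two-qubit state with amplitudes a00, a01, a10, a11 factors into one-qubit states exactly when a00·a11 − a01·a10 = 0.

```python
import math


def tensor_states(psi_l, psi_m):
    """Amplitude list of |psi_L>|psi_M>; the entry for b.b' is lambda_b * mu_b'."""
    return [lam * mu for lam in psi_l for mu in psi_m]


def is_product_state(two_qubit_state, tol=1e-12):
    """True when a two-qubit state can be written as a tensor product."""
    a00, a01, a10, a11 = two_qubit_state
    return abs(a00 * a11 - a01 * a10) < tol


s = 1 / math.sqrt(2)
separable = tensor_states([1, 0], [s, s])   # |0> tensored with (|0> + |1>)/sqrt(2)
entangled = [s, 0, 0, s]                    # (|00> + |11>)/sqrt(2)

print(separable, is_product_state(separable))
print(entangled, is_product_state(entangled))
```

The second state has only two nonzero amplitudes, yet no pair of one-qubit vectors reproduces it, which is the loss of independence described above.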

2.2 Quantum Logic Gates 

By a quantum logic gate, we shall mean a closed-system evolution (transformation) of the n-qubit state
space $\mathcal{H}_n$. In particular, this means that no information is gained or lost during this evolution, thus a
quantum gate has the same number of input qubits as output qubits. If $|\psi\rangle$ is a state vector in $\mathcal{H}_n$, the
operation of an n-qubit quantum logic gate can be represented by $|\psi\rangle \mapsto U |\psi\rangle$ for some unitary $2^n \times 2^n$
matrix U. To define unitarity, we first introduce the adjoint of a matrix.

Notation. Let M be an $n \times m$ matrix. By $M^\dagger$, we will mean the $m \times n$ matrix whose (i, j)-th entry is the
complex conjugate of the (j, i)-th entry of M. In other words, $M^\dagger$ is the conjugate transpose of M.

A square matrix M is unitary iff $M^\dagger M = I_\ell$ for $I_\ell$ an $\ell \times \ell$ identity matrix. This is the matrix equation
for a symmetry: M is unitary iff the vector images of M have the same complex inner products as the
original vectors. Thus, (a) identity matrices are unitary, (b) a product of unitary matrices is unitary, and
(c) the inverse of a unitary matrix, given by the adjoint, is also unitary. These may be restated in terms of
quantum logic. The quantum logic operation of "doing nothing" is modeled by the identity matrix, serial
composition of gates is modeled by the matrix products, and every quantum gate is reversible.

We shall often define quantum gates by simply specifying their matrices. For example, the following
matrix specifies a quantum analogue of the classical inverter: it maps $|0\rangle \mapsto |1\rangle$ and $|1\rangle \mapsto |0\rangle$.

• The inverter $\sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$
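
The unitarity condition can be checked mechanically. The following sketch, with helper names chosen for this example, forms the adjoint, tests whether the product of the adjoint with the matrix is the identity, and applies the inverter to a basis state; matrices are plain nested lists.

```python
def dagger(M):
    """Conjugate transpose of a matrix given as a list of rows."""
    rows, cols = len(M), len(M[0])
    return [[complex(M[i][j]).conjugate() for i in range(rows)] for j in range(cols)]


def matmul(A, B):
    """Ordinary matrix product A * B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def is_unitary(M, tol=1e-12):
    """Check dagger(M) * M = identity entry by entry."""
    P = matmul(dagger(M), M)
    n = len(M)
    return all(abs(P[i][j] - (1 if i == j else 0)) < tol
               for i in range(n) for j in range(n))


def apply(M, v):
    """Matrix-vector product M v."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


SIGMA_X = [[0, 1], [1, 0]]

print(is_unitary(SIGMA_X))            # the inverter is a valid quantum gate
print(apply(SIGMA_X, [1, 0]))         # |0> is mapped to |1>
print(is_unitary([[1, 1], [0, 1]]))   # a shear is not unitary
```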

Many quantum gates are specified by time-dependent matrices that represent the evolution of a quantum
system (e.g., an RF pulse affecting a nucleus) that has been "turned on" for time $\theta$. For example, the
following families of gates are the one-qubit gates most commonly available in physical implementations
of quantum circuits.

• The x-axis rotation $R_x(\theta) = \begin{pmatrix} \cos\theta/2 & i\sin\theta/2 \\ i\sin\theta/2 & \cos\theta/2 \end{pmatrix}$

• The y-axis rotation $R_y(\theta) = \begin{pmatrix} \cos\theta/2 & \sin\theta/2 \\ -\sin\theta/2 & \cos\theta/2 \end{pmatrix}$

• The z-axis rotation $R_z(\theta) = \begin{pmatrix} e^{-i\theta/2} & 0 \\ 0 & e^{i\theta/2} \end{pmatrix}$

An arbitrary one-qubit computation can be implemented as a sequence of at most three $R_z$ and $R_y$
gates. This is due to the ZYZ decomposition:³ given any $2 \times 2$ unitary matrix U, there exist angles $\Phi, \alpha, \beta, \gamma$
satisfying the following equation.

$$U = e^{i\Phi} R_z(\alpha) R_y(\beta) R_z(\gamma) \qquad (7)$$
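
To see Equation 7 at work on a concrete gate, the sketch below builds $R_z$ and $R_y$ with the matrices listed above and multiplies out one ZYZ product. The choice of the Hadamard matrix as the target, and the angles $\Phi = \pi/2$, $\alpha = \pi$, $\beta = \pi/2$, $\gamma = 0$, are an illustration worked out for this example rather than something taken from the paper.

```python
import cmath
import math


def matmul(A, B):
    """Ordinary matrix product A * B for matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def rz(theta):
    """z-axis rotation diag(e^{-i theta/2}, e^{i theta/2})."""
    return [[cmath.exp(-1j * theta / 2), 0], [0, cmath.exp(1j * theta / 2)]]


def ry(theta):
    """y-axis rotation with the sign convention printed in the text."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, s], [-s, c]]


def zyz(phase, alpha, beta, gamma):
    """Evaluate e^{i phase} Rz(alpha) Ry(beta) Rz(gamma)."""
    product = matmul(rz(alpha), matmul(ry(beta), rz(gamma)))
    scalar = cmath.exp(1j * phase)
    return [[scalar * entry for entry in row] for row in product]


hadamard = zyz(math.pi / 2, math.pi, math.pi / 2, 0.0)
for row in hadamard:
    print([complex(round(z.real, 6), round(z.imag, 6)) for z in row])
```

Up to rounding, the printed matrix has entries ±1/√2 arranged as in the Hadamard gate, so three rotations and a scalar suffice for this particular one-qubit computation.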



3 This decomposition is well known and finds many proofs in the literature, e.g., [3]. We shall derive another as a corollary in
a later section.



The nomenclature $R_x$, $R_y$, $R_z$ is motivated by a picture of one-qubit states as points on the surface of a
sphere of unit radius in $\mathbb{R}^3$. This picture is called the Bloch sphere [22], and may be obtained by expanding
an arbitrary two-dimensional complex vector as below.

$$|\psi\rangle = \alpha_0 |0\rangle + \alpha_1 |1\rangle = r e^{it/2} \left( e^{-i\varphi/2} \cos\frac{\theta}{2} |0\rangle + e^{i\varphi/2} \sin\frac{\theta}{2} |1\rangle \right) \qquad (8)$$

The constant factor $r e^{it/2}$ is physically undetectable. Ignoring it, we are left with two angular parameters
$\theta$ and $\varphi$, which we interpret as spherical coordinates $(1, \theta, \varphi)$. In this picture, $|0\rangle$ and $|1\rangle$ correspond to the
north and south poles, $(1, 0, 0)$ and $(1, \pi, 0)$, respectively. The $R_x(\theta)$ gate (resp. $R_y(\theta)$, $R_z(\theta)$) corresponds
to a counterclockwise rotation by $\theta$ around the x (resp. y, z) axis. Finally, just as the point given by the
spherical coordinates $(1, \theta, \varphi)$ can be moved to the north pole by first rotating $-\varphi$ degrees around the z-axis,
then $-\theta$ degrees around the y axis, so too the following matrix equations hold.

$$R_y(-\theta) R_z(-\varphi) |\psi\rangle = r e^{it/2} |0\rangle$$
$$R_y(\theta - \pi) R_z(\pi - \varphi) |\psi\rangle = r e^{i(t - \pi)/2} |1\rangle \qquad (9)$$
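
The parameters $r$, $t$, $\theta$, $\varphi$ of Equation 8 can be read off from the two amplitudes. The sketch below is one way to do so, assuming that the phase of a zero amplitude may be taken as 0; the function names are hypothetical. It then rebuilds the amplitudes from the parameters as a consistency check.

```python
import cmath
import math


def bloch_parameters(alpha0, alpha1):
    """Return (r, t, theta, phi) such that Equation 8 reproduces the amplitudes."""
    r = math.hypot(abs(alpha0), abs(alpha1))
    theta = 2 * math.atan2(abs(alpha1), abs(alpha0))
    arg0, arg1 = cmath.phase(alpha0), cmath.phase(alpha1)
    # arg0 = (t - phi) / 2 and arg1 = (t + phi) / 2
    return r, arg0 + arg1, theta, arg1 - arg0


def amplitudes(r, t, theta, phi):
    """Right-hand side of Equation 8 as a pair (alpha0, alpha1)."""
    common = r * cmath.exp(1j * t / 2)
    return (common * cmath.exp(-1j * phi / 2) * math.cos(theta / 2),
            common * cmath.exp(1j * phi / 2) * math.sin(theta / 2))


params = bloch_parameters(0.6, 0.8j)
print(params)
print(amplitudes(*params))
```

The decomposition is not unique, since shifting $t$ and $\varphi$ by multiples of $2\pi$ in a compatible way gives the same vector; the sketch returns one valid choice.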



2.3 Quantum Circuits 

A combinational quantum logic circuit consists of quantum gates, interconnected by quantum wires - 
carrying qubits - without fanout or feedback. As each quantum gate has the same number of inputs and 
outputs, any cut through the circuit crosses the same number of wires. Fixing an ordering on these, a 
quantum circuit can be understood as representing the sequence of quantum logic operations on a quantum 
register. An example is depicted in Figure 1 and many more will appear throughout the paper.

Figure 1 contains 12 one- and two-qubit gates applied to a three-qubit register. Observe that the state
of a three-qubit register is described by a vector in $\mathcal{H}_3$ (an 8-element column), whereas one- and two-qubit
gates are described by unitary operations on $\mathcal{H}_1$ and $\mathcal{H}_2$ (given by $2 \times 2$ and $4 \times 4$ matrices, respectively).
In order to reconcile the dimensions of various state-vectors and matrices, we introduce the tensor product 
operation. 

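Consider an $\ell + m = n$-qubit register, on which an $\ell$-qubit gate V acts on the top $\ell$ qubits, with an
m-qubit gate W acting on the remainder. We expand the state $|\psi\rangle \in \mathcal{H}_n$ of the n-qubit register, as follows.

$$|\psi\rangle = \sum_{b \in \mathbb{B}^n} \alpha_b |b\rangle = \sum_{b \in \mathbb{B}^\ell,\; b' \in \mathbb{B}^m} \alpha_{b \cdot b'} |b\rangle |b'\rangle \qquad (10)$$

Then, denoting by $V \otimes W$ the operation performed on the register as a whole,

$$(V \otimes W) |\psi\rangle = \sum_{b \in \mathbb{B}^\ell,\; b' \in \mathbb{B}^m} \alpha_{b \cdot b'} \left( V |b\rangle \right) \left( W |b'\rangle \right) \qquad (11)$$

Here, $V |b\rangle \in \mathcal{H}_\ell$ and $W |b'\rangle \in \mathcal{H}_m$ are to be concatenated, or tensored, as per Equation 6. It can be deduced
from Equation 11 that the $2^n \times 2^n$ matrix of $V \otimes W$ is given by

$$(V \otimes W)_{r \cdot r',\, c \cdot c'} = V_{r,c} W_{r',c'} \quad \text{for } r, c \in \mathbb{B}^\ell, \; r', c' \in \mathbb{B}^m \qquad (12)$$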
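
Equation 12 translates directly into a nested loop over row and column indices. The sketch below, with helper names chosen for this example, builds the matrix of $V \otimes W$ entry by entry and applies it to a bitstring state; the row index $r \cdot r'$ is encoded as r·(rows of W) + r'.

```python
def kron(V, W):
    """Matrix of V tensor W: entry (r.r', c.c') equals V[r][c] * W[r'][c']."""
    return [[V[r][c] * W[rp][cp]
             for c in range(len(V[0])) for cp in range(len(W[0]))]
            for r in range(len(V)) for rp in range(len(W))]


def apply(M, v):
    """Matrix-vector product M v."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


IDENTITY = [[1, 0], [0, 1]]
SIGMA_X = [[0, 1], [1, 0]]

# Invert the top qubit of a two-qubit register and leave the bottom one alone.
gate = kron(SIGMA_X, IDENTITY)
state_01 = [0, 1, 0, 0]          # the bitstring state |01>
print(apply(gate, state_01))     # [0, 0, 0, 1], i.e. |11>
```

The same function handles gates of different widths, for example a one-qubit gate next to a two-qubit gate, because only the matrix dimensions enter the index arithmetic.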



2.4 Circuit Equivalences 

Figure 1: A typical quantum logic circuit. Information flows from left to right, and the higher wires
represent higher order qubits. The quantum operation performed by this circuit is
$(U_7 \otimes U_8 \otimes U_9)(I_2 \otimes V_3)(V_2 \otimes I_2)(U_4 \otimes U_5 \otimes U_6)(I_2 \otimes V_1)(U_1 \otimes U_2 \otimes U_3)$, and the last factor is outlined above. Note that when
the matrix $A \cdot B$ is applied to vector v, this is equivalent to applying the matrix B first, followed by the
matrix A. Therefore, the formulas describing quantum circuits must be read right to left.

Rather than begin the statement of every theorem with "let $U_1, U_2, \ldots$ be unitary operators...," we are going
to use diagrams of quantum logic circuits and circuit equivalences. An equivalence of circuits in which all
gates are fully specified can be checked by multiplying matrices. However, in addition to fully specified
gates, our circuit diagrams will contain the following generic, or under-specified gates:

Notation. An equivalence of circuits containing generic gates will mean that for any specification (i.e., 
parameter values) of the gates on one side, there exists a specification of the gates on the other such that 
the circuits compute the same operator. Generic gates used in this paper are limited to the following: 



• A generic unitary gate.

• $R_z$: an $R_z$ gate without a specified angular parameter; conventions for $R_x$, $R_y$ are similar.

• $\Delta$: a generic diagonal gate.

• A generic scalar multiplication (uncontrolled gate implemented by "doing nothing").

We may restate Equation 7 as an equivalence of generic circuits.

Theorem 1 : The ZYZ decomposition [3].

[Circuit diagram: a generic one-qubit gate is equivalent to an $R_z$ gate, an $R_y$ gate, and an $R_z$ gate in sequence.]

Similarly, we also allow underspecified states. 

Notation. We shall interpret a circuit with underspecified states and generic gates as an assertion that for
any specification of the underspecified input and output states, there is some specification of the generic gates
for which the circuit performs as advertised. We shall denote a completely unspecified state as $|\ \rangle$, and an unspecified
bitstring state as $|{*}\rangle$.

For example, we may restate Equation 9 in this manner.

Theorem 2 : Preparation of one-qubit states.

[Circuit diagram: an unspecified one-qubit state $|\ \rangle$ passes through an $R_z$ gate and then an $R_y$ gate.]

We shall use a backslash to denote that a given wire may carry an arbitrary number of qubits (quantum
bus). In the sequel, we seek backslashed analogues of Theorems 1 and 2.



3 Quantum Conditionals and the Quantum Multiplexor 

Classical conditionals can be described by the if-then-else construction: if the predicate is true, 
perform the action specified in the then clause, if it is false, perform the action specified in the else 
clause. At the gate level, such an operation might be performed by first processing the two clauses in 
parallel, then multiplexing the output. To form the quantum analogue, we replace the predicate by a qubit, 
replace true and false by $|1\rangle$ and $|0\rangle$, and demand that the actions corresponding to clauses be unitary.
The resulting "quantum conditional" operator U will then be unitary. In particular, when selecting based
on a coherent superposition $\alpha_0 |0\rangle + \alpha_1 |1\rangle$, it will generate a linear combination of the then and else
outcomes. Below, we shall use the term quantum multiplexor to refer to the circuit block implementing a 
quantum conditional. 

Notation. We shall say that a gate U is a quantum multiplexor with select qubits S if it preserves any
bitstring state $|b\rangle$ carried by S. In this case, we denote U in quantum logic circuit diagrams by "□" on each
select qubit, connected by a vertical line to a gate on the remaining data (read-write) qubits.

In the event that a multiplexor has a single select bit, and the select bit is most significant, the matrix of
the quantum multiplexor is block diagonal.

$$U = \begin{pmatrix} U_0 & \\ & U_1 \end{pmatrix} \qquad (13)$$

The multiplexor will apply $U_0$ or $U_1$ to the data qubits according as the select qubit carries $|0\rangle$ or $|1\rangle$. To
express such a block diagonal decomposition, we shall use the notation $U = U_0 \oplus U_1$ that is standard in
linear algebra. More generally, let V be a multiplexor with s select qubits and a d-qubit wide data bus. If
the select bits are most significant, the matrix of V will be block diagonal, with $2^s$ blocks of size $2^d \times 2^d$.
The j-th block $V_j$ is the operator applied to the data bits when the select bits carry $|j\rangle$.
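
The block-diagonal form of Equation 13 and its behavior on a superposed select qubit can be sketched as follows. The helper names are illustrative, and the particular choice $U_0 = I$, $U_1 = \sigma_x$ is made for this example only.

```python
import math


def direct_sum(U0, U1):
    """Block-diagonal matrix U0 (+) U1 with the select qubit most significant."""
    d0, d1 = len(U0), len(U1)
    top = [list(row) + [0] * d1 for row in U0]
    bottom = [[0] * d0 + list(row) for row in U1]
    return top + bottom


def apply(M, v):
    """Matrix-vector product M v."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


IDENTITY = [[1, 0], [0, 1]]
SIGMA_X = [[0, 1], [1, 0]]
multiplexor = direct_sum(IDENTITY, SIGMA_X)

# Select qubit in (|0> + |1>)/sqrt(2), data qubit in |0>.
s = 1 / math.sqrt(2)
state = [s, 0, s, 0]
print(apply(multiplexor, state))   # amplitudes on |00> and |11> only
```

The output combines the "else" outcome (data untouched) and the "then" outcome (data inverted) with the amplitudes of the select qubit, as linearity requires.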

In general, a gate depicted as a quantum multiplexor need not read or modify as many qubits as in- 
dicated on a diagram. For example, a multiplexor which performs the same operation on the data bits 
regardless of what the select bits carry can be implemented as an operation on the data bits alone. We give 
a less trivial example below: a multiplexor which applies a different scalar multiplication for each value of 
the select bits can be implemented as a diagonal operator applied to the select bits. 



Theorem 3 : Recognizing diagonals.

[Circuit diagram: a multiplexed scalar acting on the least significant qubit is equivalent to a diagonal gate applied to the select qubits alone.]

Indeed, both circuits represent diagonal matrices in which each diagonal entry is repeated (at least)
twice. In the former case, the repetition is due to a multiplexed scalar acting on the least significant qubit,
and in the latter there is no attempt to modify the least significant qubit.

We now clarify the meaning of multiplexed generic gates in circuit diagrams, like that in the above 
circuit equivalence. 

Notation. Let G be a generic gate. A specification U of a multiplexed-G gate can be any quantum 
multiplexor which effects a potentially different specification of G on the data qubits for each bitstring 
appearing on the select qubits. Of course, select qubits may carry a superposition of several bitstring 
states, in which case the behavior of the multiplexed gate is defined by linearity. 



3.1 Quantum Multiplexors on Two Qubits 

Perhaps the simplest quantum multiplexor is the Controlled-NOT (CNOT) gate.

$$\mathrm{CNOT} = I \oplus \sigma_x = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix} \qquad (14)$$

On bitstring states, the CNOT flips the second (data) bit if the first (select) bit is $|1\rangle$, hence the name
Controlled-NOT. The CNOT is so common in quantum circuits that it has its own notation: a "•" on the
select qubit connected by a vertical line to an "⊕" on the data qubit. This notation is motivated by the
characterization of the CNOT by the formula $|b_1\rangle |b_2\rangle \mapsto |b_1\rangle |b_1 \text{ XOR } b_2\rangle$. Several CNOTs are depicted
in Figure 3.
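
The XOR characterization determines the CNOT matrix completely, since a gate is fixed by its action on bitstring states. A minimal sketch, with a hypothetical function name, that builds the matrix column by column from the rule:

```python
def cnot_from_xor():
    """Build the 4x4 CNOT matrix from |b1>|b2> -> |b1>|b1 XOR b2>."""
    matrix = [[0] * 4 for _ in range(4)]
    for b1 in (0, 1):
        for b2 in (0, 1):
            column = 2 * b1 + b2            # index of the input bitstring b1 b2
            row = 2 * b1 + (b1 ^ b2)        # index of the output bitstring
            matrix[row][column] = 1
    return matrix


for row in cnot_from_xor():
    print(row)
```

The printed rows agree with the block-diagonal form in which the identity acts when the select bit is 0 and the inverter acts when it is 1.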

The CNOT, together with the one-qubit gates defined in §2, forms a universal gate library for quantum
circuits.⁴ In particular, we can use it as a building block to help construct more complicated multiplexors.
For example, we can implement the multiplexor $R_z(\theta_0) \oplus R_z(\theta_1)$ by the following circuit.

[Circuit diagram: two $R_z$ gates on the data qubit, each followed by a CNOT controlled by the select qubit.]

In fact, the exact same statement holds if we replace $R_z$ by $R_y$ (this can be verified by multiplying four
matrices). We summarize the result with a circuit equivalence.

Theorem 4 : Demultiplexing a singly-multiplexed $R_y$ or $R_z$.

[Circuit diagram: a singly-multiplexed $R_k$ gate is equivalent to an $R_k$ gate on the data qubit, a CNOT, a second $R_k$ gate on the data qubit, and a second CNOT.]
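
Theorem 4 can be checked numerically by multiplying the four matrices. The sketch below assumes the angle choice $(\theta_0 + \theta_1)/2$ for the first rotation and $(\theta_0 - \theta_1)/2$ for the second, which follows from $\sigma_x R_k(\beta) \sigma_x = R_k(-\beta)$ for $k = y, z$; the helper names are illustrative.

```python
import cmath
import math


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def kron(V, W):
    return [[V[r][c] * W[rp][cp]
             for c in range(len(V[0])) for cp in range(len(W[0]))]
            for r in range(len(V)) for rp in range(len(W))]


def rz(theta):
    return [[cmath.exp(-1j * theta / 2), 0], [0, cmath.exp(1j * theta / 2)]]


def ry(theta):
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, s], [-s, c]]


I2 = [[1, 0], [0, 1]]
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]


def multiplexed(rot, theta0, theta1):
    """Block-diagonal matrix rot(theta0) (+) rot(theta1)."""
    R0, R1 = rot(theta0), rot(theta1)
    return [R0[0] + [0, 0], R0[1] + [0, 0], [0, 0] + R1[0], [0, 0] + R1[1]]


def demultiplexed(rot, theta0, theta1):
    """Rotation, CNOT, rotation, CNOT; matrix factors are written right to left."""
    first = kron(I2, rot((theta0 + theta1) / 2))
    second = kron(I2, rot((theta0 - theta1) / 2))
    return matmul(CNOT, matmul(second, matmul(CNOT, first)))


def max_difference(A, B):
    return max(abs(A[i][j] - B[i][j]) for i in range(len(A)) for j in range(len(A[0])))


for rot in (ry, rz):
    print(rot.__name__, max_difference(multiplexed(rot, 0.7, -1.9),
                                       demultiplexed(rot, 0.7, -1.9)))
```

Both printed differences are at the level of floating-point rounding, consistent with the claim that the same circuit shape serves for $R_y$ and $R_z$.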



A similar decomposition exists for any $U \oplus V$ where U, V are one-qubit gates. The idea is to first
unconditionally apply V on the less significant qubit, and then apply $A = UV^\dagger$, conditioned on the more
significant qubit. Decompositions for such controlled-A operators are well known [12]. Indeed, if we
write $A = e^{it} R_z(\alpha) R_y(\beta) R_z(\gamma)$ by Theorem 1, then $U \oplus V$ is implemented by the following circuit.

[Circuit diagram: the gate V followed by a decomposition of the controlled-A operator into one-qubit gates and CNOTs.]

Since V is a generic unitary, it can absorb adjacent one-qubit boxes, simplifying the circuit. We re-express
the result as a circuit equivalence.

Theorem 5 : Decompositions of a two-qubit multiplexor

[Circuit diagram: two equivalent decompositions of a two-qubit multiplexor.]

Proof. The first equivalence is just a re-statement of what we have already seen; the second follows from
it by applying a CNOT on the right to both sides and extracting a diagonal operator. ■



4 This was first shown in [12]. The results in the present work also constitute a complete proof.



3.2 The Multiplexor Extension Property 

The theory of n-qubit quantum multiplexors begins with the observation that whole circuits and even circuit 
equivalences can be multiplexed. This observation has non-quantum origins and can be exemplified by 
comparing two expressions involving conditionals in terms of a classical bit s. 

• if (s) $A_0 \cdot B_0$ else $A_1 \cdot B_1$

• $A_s \cdot B_s$. Here $A_s$ means if (s) $A_0$ else $A_1$, with the syntax and semantics of (s ? $A_0$ : $A_1$) in
the C programming language.

Indeed, one can either make a whole expression conditional on s or make each term conditional on s;
the two behaviors will be identical. Similarly, one can multiplex a whole equation (with two different
instantiations of every term) or multiplex each of its terms. The same applies to quantum multiplexing by
linearity.

Multiplexor Extension Property (MEP). Let $C \equiv D$ be an equivalence of quantum circuits. Let $C'$ be
obtained from C by adding a wire which acts as a multiplexor control for every generic gate in C, and let
$D'$ be obtained from D similarly. Then $C' \equiv D'$.

Consider the special case of quantum multiplexors with a single data bit, but arbitrarily many select 
bits. We seek to implement such multiplexors via CNOTs and one-qubit gates, beginning with the following 
decomposition. 



Theorem 6 : ZYZ decomposition for single-data-bit multiplexors.

[Circuit diagram: a multiplexor with a single data bit is equivalent to a multiplexed $R_z$ gate, a multiplexed $R_y$ gate, and a multiplexed $R_z$ gate on the data bit, together with a diagonal gate.]

Proof. Apply the MEP to Theorem 1, and Theorem 3 to the result. ■

The diagonal gate appearing on the right can be recursively decomposed.

Theorem 7 : Decomposition of diagonal operators [8].

[Circuit diagram: three equivalences decomposing a diagonal gate.]

Proof. The first equivalence asserts that any diagonal gate can be expressed as a multiplexor of diagonal
gates. This is true because diagonal gates possess the block-diagonal structure characteristic of multiplexors,
with each block being diagonal. The second equivalence amounts to the MEP applied to the obvious
fact that a one-qubit gate given by a diagonal matrix is a scalar multiple of an $R_z$ gate. The third follows
from Theorem 3. ■

It remains to decompose the other gates appearing on the right in the circuit diagram of Theorem 6.
We shall call these gates multiplexed $R_z$ (or $R_y$) gates,⁵ as, e.g., the rightmost would apply a different $R_z$
gate to the data qubit for each classical configuration of the select bits. While efficient implementations are
known [5, 21], the usual derivations involve large matrices and Gray codes.

5 Other authors have used the term uniformly-controlled rotations to describe these gates [21].

Figure 2: The recursive decomposition of a multiplexed $R_z$ gate. The boxed CNOT gates may be canceled.



Theorem 8 : Demultiplexing multiplexed $R_k$ gates, k = y, z [5, 21].

[Circuit diagram: a multiplexed $R_k$ gate is equivalent to a multiplexed $R_k$ gate with one fewer select bit, a CNOT, a second such multiplexed $R_k$ gate, and a second CNOT.]

Proof. Apply the MEP to Theorem 4. ■

It is worth noting that since every gate appearing in Theorem 8 is symmetric, the order of gates in
this decomposition may be reversed. Recursive application of Theorem 8 can decompose any multiplexed
rotation into basic gates. In the process, some CNOT gates cancel, as is illustrated in Figure 2. The final
CNOT count is $2^k$, for k select bits.
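
The recursion and the resulting CNOT count can be traced in a short program. The sketch below is an illustration under stated assumptions, not the authors' implementation: gates are stored as tuples in time order, the half-sum and half-difference angles follow the two-qubit case, every second sub-block is emitted in reversed order so that matching CNOTs become neighbors, and the check tracks only the net angle seen by the data qubit for each classical setting of the select bits, using $\sigma_x R_k(\beta) \sigma_x = R_k(-\beta)$ for $k = y, z$.

```python
def demultiplex(angles, selects, reverse=False):
    """Gate list for a multiplexed rotation; angles[j] applies when the selects spell j."""
    if not selects:
        return [("rot", angles[0])]
    half = len(angles) // 2
    top, rest = selects[0], selects[1:]
    sums = [(angles[j] + angles[half + j]) / 2 for j in range(half)]
    diffs = [(angles[j] - angles[half + j]) / 2 for j in range(half)]
    gates = (demultiplex(sums, rest) + [("cnot", top)]
             + demultiplex(diffs, rest, reverse=True) + [("cnot", top)])
    return gates[::-1] if reverse else gates


def cancel_cnots(gates):
    """Simplify runs of adjacent CNOTs: they share a target, commute, and pairs cancel."""
    out, run = [], []

    def flush():
        for control in sorted(set(run)):
            if run.count(control) % 2:
                out.append(("cnot", control))
        run.clear()

    for kind, value in gates:
        if kind == "cnot":
            run.append(value)
        else:
            flush()
            out.append((kind, value))
    flush()
    return out


def applied_angle(gates, bits):
    """Net rotation angle on the data qubit when the select qubits carry `bits`."""
    flipped, total = False, 0.0
    for kind, value in gates:
        if kind == "cnot":
            flipped ^= bool(bits[value])
        else:
            total += -value if flipped else value
    assert not flipped, "data qubit must end without a net inversion"
    return total


k = 3
angles = [0.1, 0.5, -0.3, 1.2, 2.0, -1.1, 0.8, 0.4]
circuit = cancel_cnots(demultiplex(angles, list(range(k))))
print(sum(1 for kind, _ in circuit if kind == "cnot"))   # 8, i.e. 2**k
```

Without the cancellation pass the same recursion emits $2^{k+1} - 2$ CNOTs, so the reversal of alternate sub-blocks is what brings the count down to $2^k$ in this sketch.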

4 The Preparation of Quantum States 

We present an asymptotically-optimal technique for the initialization of a quantum register. The problem 
has been known for some time in quantum computing, and it was considered in [11, 20, 26] after the
original formulation [10] of the quantum circuit model. It is also a computational primitive in designing
larger quantum circuits. 

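Theorem 9 : Disentangling a qubit. An arbitrary $(n+1)$-qubit state can be converted into a separable
(i.e., unentangled) state by a circuit shown below. The resulting state is a tensor product involving a desired
basis state ($|0\rangle$ or $|1\rangle$) on the least significant qubit.

[Circuit diagram: a multiplexed $R_z$ gate followed by a multiplexed $R_y$ gate, each acting on the least significant qubit and selected by the remaining $n$ qubits; the least significant qubit leaves in the desired basis state.]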

Proof. We show how to produce $|0\rangle$ on the least significant bit; the case of $|1\rangle$ is similar. Let $|\psi\rangle$ be
an arbitrary $(n+1)$-qubit state. Divide the $2^{n+1}$-element vector $|\psi\rangle$ into $2^n$ contiguous 2-element blocks.
Each is to be interpreted as a two-dimensional complex vector, and the $c$-th is to be labeled $|\psi_c\rangle$. We now
determine $r_c, t_c, \varphi_c, \theta_c$ as in Equation 9:

$$R_y(-\theta_c)\, R_z(-\varphi_c)\, |\psi_c\rangle = r_c e^{i t_c} |0\rangle \qquad (15)$$

Let $|\psi'\rangle$ be the $n$-qubit state given by the $2^n$-element row vector with $c$-th entry $r_c e^{i t_c}$, and let $U$ be the block
diagonal sum $\bigoplus_c R_y(-\theta_c) R_z(-\varphi_c)$. Then $U|\psi\rangle = |\psi'\rangle |0\rangle$, and $U$ may be implemented by a multiplexed $R_z$
gate followed by a multiplexed $R_y$. ■
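The per-block angles in this proof can be computed directly from the two amplitudes of each block. The following is a minimal sketch, not the authors' code: Equation 9 is not reproduced in this section, so the conventions $R_z(\varphi) = \mathrm{diag}(e^{-i\varphi/2}, e^{i\varphi/2})$ and $R_y(\theta) = \begin{pmatrix} \cos\theta/2 & -\sin\theta/2 \\ \sin\theta/2 & \cos\theta/2 \end{pmatrix}$ are assumptions, as are the function names.

```python
import cmath
import math

def disentangle_angles(alpha, beta):
    """Return (r, t, phi, theta) for the block (alpha, beta).

    Chosen so that Ry(-theta) Rz(-phi) (alpha, beta) = r * exp(i t) * (1, 0)
    under the conventions assumed in the lead-in.
    """
    phase_a, phase_b = cmath.phase(alpha), cmath.phase(beta)
    phi = phase_b - phase_a            # Rz(-phi) equalizes the two phases
    t = (phase_a + phase_b) / 2        # common phase left on |0>
    theta = 2 * math.atan2(abs(beta), abs(alpha))
    r = math.hypot(abs(alpha), abs(beta))
    return r, t, phi, theta

def apply_block(alpha, beta, phi, theta):
    """Apply Ry(-theta) Rz(-phi) to the two-component vector (alpha, beta)."""
    a = cmath.exp(0.5j * phi) * alpha
    b = cmath.exp(-0.5j * phi) * beta
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return c * a + s * b, -s * a + c * b

def disentangle_last_qubit(state):
    """Return (amplitudes on |0>, leftover amplitudes on |1>) block by block."""
    kept, leftover = [], []
    for c in range(0, len(state), 2):
        _, _, phi, theta = disentangle_angles(state[c], state[c + 1])
        top, bottom = apply_block(state[c], state[c + 1], phi, theta)
        kept.append(top)
        leftover.append(bottom)
    return kept, leftover

# Two-qubit example: the leftover amplitudes vanish and the norm is preserved.
kept, leftover = disentangle_last_qubit([0.5, 0.5j, -0.5, 0.5])
print(max(abs(x) for x in leftover) < 1e-12)
print(abs(sum(abs(x) ** 2 for x in kept) - 1) < 1e-12)
```

The list `kept` plays the role of the $n$-qubit state with entries $r_c e^{i t_c}$, and the lists of `phi` and `theta` values are the angle sets of the multiplexed $R_z$ and $R_y$ gates.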

We may apply Theorem 8 to implement the $(n+1)$-qubit circuit given above with $2^{n+1}$ CNOT gates. A
slight optimization is possible given that the gates on the right-hand side in Theorem 8 can be optionally
reversed, as explained above. Indeed, if we reverse the decomposition of the multiplexed $R_y$ gate, its first
gate (CNOT) will cancel with the last gate (CNOT) from the decomposed multiplexed $R_z$ gate. Thus, only
$2^{n+1} - 2$ CNOT gates are needed.

Applying Theorem 9 recursively can reduce a given $n$-qubit quantum state $|\psi\rangle$ to a scalar multiple of a
desired bitstring state $|b\rangle$; the resulting circuit $C$ uses $2^{n+1} - 2n$ CNOT gates. To go from $|b\rangle$ to $|\psi\rangle$, apply
the gates of $C$ in reverse order and inverted. We shall call this the inverse circuit, $C^\dagger$.

The state preparation technique can be used to decompose an arbitrary unitary operator $U$. The idea is
to construct a circuit for $U^\dagger$ by iteratively applying state preparation. Indeed, an operator is entirely
determined by its behavior on basis vectors. To this end, each iteration needs to implement the correct behavior
on a new basis vector while preserving the behavior on previously processed basis vectors. This idea has
been tried before [20, 31], but with methods less efficient than Theorem 9. We outline the procedure below.

• At step 0, apply Theorem 9 to find a circuit $C_0$ that maps $U|0\rangle$ to a scalar multiple of $|0\rangle$. Let
$U_1 = C_0 U$.

• At step $j$, apply Theorem 9 to find a circuit $C_j$ that maps $U_j|j\rangle$ to a scalar multiple of $|j\rangle$. Importantly,
the construction of $C_j$ and the previous steps of the algorithm ensure $C_j|i\rangle = |i\rangle$ for all $i < j$. Define
$U_{j+1} = C_j U_j$.

• $U_{2^n - 1}$ will be diagonal, and may be implemented by a circuit $D$ via Theorem 7.

• Finally, $U = C_0^\dagger C_1^\dagger \cdots C_{2^n - 2}^\dagger D$.

Thus $2^n - 1$ state preparation steps and 1 diagonal operator are used. The final CNOT count is
$2 \times 4^n - (2n+3) \times 2^n + 2n$. For $n > 2$, we improve upon the best previously published technique to decompose
unitary operators column by column [31], as can be seen in Table 1.
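As a quick arithmetic check, the closed-form count can be evaluated and compared with the two QR rows of Table 1. The snippet is only a sketch; the function name is hypothetical and the dictionaries simply copy the table entries for $n = 2, \ldots, 7$.

```python
def column_by_column_cnot_count(n):
    """Closed-form CNOT count quoted in the text for an n-qubit unitary."""
    return 2 * 4**n - (2 * n + 3) * 2**n + 2 * n

# Values copied from Table 1.
QR_THEOREM_9 = {2: 8, 3: 62, 4: 344, 5: 1642, 6: 7244, 7: 30606}
QR_PREVIOUS = {2: 4, 3: 64, 4: 536, 5: 4156, 6: 22618, 7: 108760}

for n in sorted(QR_THEOREM_9):
    count = column_by_column_cnot_count(n)
    matches_table = count == QR_THEOREM_9[n]
    improves = count < QR_PREVIOUS[n]
    print(n, count, matches_table, improves)
```

The comparison flag is false only for $n = 2$ (8 versus 4 CNOTs), which is why the improvement is claimed for $n > 2$.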

5 A Functional Decomposition for Quantum Logic 

Below we introduce a decomposition for quantum logic that is analogous to the well-known Shannon
decomposition of Boolean functions ($f = x_i f_{x_i=1} + \bar{x}_i f_{x_i=0}$). It expresses an arbitrary $n$-qubit quantum
operator in terms of $(n-1)$-qubit operators (cofactors) by means of quantum multiplexors. Applying this
decomposition recursively yields a synthesis algorithm, for which we compute gate counts.

5.1 The Cosine-Sine Decomposition 

We recall the Cosine-Sine Decomposition from matrix algebra.^6 It has been used explicitly and regularly
to build quantum circuits [30, 21] and has also been employed inadvertently [32, 5].

The CSD states that an even-dimensional unitary matrix $U \in \mathbb{C}^{\ell \times \ell}$ can be decomposed into smaller
unitaries $A_1, A_2, B_1, B_2$ and real diagonal matrices $C, S$ such that $C^2 + S^2 = I_{\ell/2}$.
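For reference, the factorization is commonly written in the block form below. This is the standard statement of the CSD rather than a formula recovered from this section, and the placement of the minus sign on $S$ is a convention that varies between references.

```latex
\[
U \;=\;
\begin{pmatrix} A_1 & 0 \\ 0 & B_1 \end{pmatrix}
\begin{pmatrix} C & -S \\ S & C \end{pmatrix}
\begin{pmatrix} A_2 & 0 \\ 0 & B_2 \end{pmatrix},
\qquad
C = \mathrm{diag}(\cos\theta_1, \ldots, \cos\theta_{\ell/2}),
\quad
S = \mathrm{diag}(\sin\theta_1, \ldots, \sin\theta_{\ell/2}).
\]
```

Each $2 \times 2$ slice $\begin{pmatrix} \cos\theta_i & -\sin\theta_i \\ \sin\theta_i & \cos\theta_i \end{pmatrix}$ of the central factor is a rotation, which is why that factor behaves like a collection of $R_y$ gates.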

For 2x2 matrices U, we may extract scalars out of the left and right factors to recover Theorem^ For 
larger $U$, the left and right factors $A_j \oplus B_j$ are quantum multiplexors controlled by the most significant
qubit which determines whether $A_j$ or $B_j$ is to be applied to the lower order qubits. The central factor has
the same structure as the $R_y$ gate. A closer inspection reveals that it applies a different $R_y$ gate to the most

6 Source code for computing the CSD can be obtained from Matlab by typing "which gsvd" at a Matlab command prompt. On 
most laptops this numerical computation scales to ten-qubit quantum operators, i.e., 1024 x 1024 matrices. 
significant bit for each classical configuration of the low order bits. Thus the CSD can be restated as the
following equivalence of generic circuits. 

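Theorem 10 : The Cosine-Sine Decomposition [15, 24].

[Circuit diagram: a generic $n$-qubit operator equals a multiplexor on the lower $n-1$ qubits selected by the most significant qubit, then a multiplexed $R_y$ gate on the most significant qubit selected by the lower qubits, then a second multiplexor of the first kind.]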

It has been observed that this theorem may be recursively applied to the side factors on the right-hand
side [30]. Indeed, this can be achieved by adding more qubits via the MEP, as shown below.

Theorem 11 : A multiplexed Cosine-Sine Decomposition PI)! . 



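[Circuit diagram: the Cosine-Sine Decomposition of Theorem 10 with additional select qubits attached to each of its gates via the MEP.]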




We may now outline the best previously published generic quantum logic synthesis algorithm [21].
Iterated application of Theorem 11 to the decomposition of Theorem 10 gives a decomposition of an
arbitrary unitary operator into single-data-bit QMUX gates, some of which are already multiplexed $R_y$
gates. Those which are not can be decomposed into multiplexed rotations by Theorem 6, and then all the
multiplexed rotations can be decomposed into elementary gates by Theorem 8.

One weakness of this algorithm is that it cannot readily take advantage of hand-optimized generic
circuits on low numbers of qubits [34, 33, 27, 25]. This is because it does not recurse on generic operators,
but rather on multiplexors.



5.2 Demultiplexing Multiplexors, and the Quantum Shannon Decomposition 

We now give a novel, simpler decomposition of single-select-bit multiplexors whose two cofactors are
generic operators. As will be shown later, it leads to a more natural recursion, with known optimizations
in end-cases [34, 33, 27, 25].



Theorem 12 : Demultiplexing a multiplexor. 





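Proof. Let $U = U_1 \oplus U_2$ be the multiplexor of choice; we formulate and solve an equation for the unitaries
required to implement $U$ in the manner indicated above. We want unitary $V$, $W$ and unitary diagonal $D$
satisfying $U = (I \otimes V)(D \oplus D^\dagger)(I \otimes W)$. In other words,

$$\begin{pmatrix} U_1 & \\ & U_2 \end{pmatrix} = \begin{pmatrix} V & \\ & V \end{pmatrix} \begin{pmatrix} D & \\ & D^\dagger \end{pmatrix} \begin{pmatrix} W & \\ & W \end{pmatrix} \qquad (16)$$

Blockwise this reads $U_1 = V D W$ and $U_2 = V D^\dagger W$. Multiplying the expression for $U_1$ by the adjoint of the
expression for $U_2$, we cancel out the $W$-related terms and obtain $U_1 U_2^\dagger = V D^2 V^\dagger$.
Using this equation, one can recover $D$ and $V$ from $U_1 U_2^\dagger$ by a standard computational primitive called
diagonalization. Further, $W = D V^\dagger U_2$. It remains only to remark that for $D$ diagonal, the matrix $D \oplus D^\dagger$ is
in fact a multiplexed $R_z$ gate acting on the most significant bit in the circuit. ■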
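The construction in this proof can be tried out on the smallest case, a multiplexor whose two cofactors are one-qubit gates. The sketch below is illustrative only: it uses a closed-form $2 \times 2$ eigendecomposition in place of a general diagonalization routine, assumes the product of the first cofactor with the adjoint of the second has distinct eigenvalues and a nonzero upper-right entry, and all function names are hypothetical.

```python
import cmath
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(a):
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]

def sample_unitary(alpha, a, b, c):
    """exp(i alpha) Rz(a) Ry(b) Rz(c): a generic one-qubit unitary."""
    def rz(t):
        return [[cmath.exp(-0.5j * t), 0], [0, cmath.exp(0.5j * t)]]
    ry = [[math.cos(b / 2), -math.sin(b / 2)],
          [math.sin(b / 2), math.cos(b / 2)]]
    m = matmul(matmul(rz(a), ry), rz(c))
    phase = cmath.exp(1j * alpha)
    return [[phase * x for x in row] for row in m]

def diagonalize(m):
    """Eigenvalues and eigenvector matrix of a 2x2 unitary m.

    Assumes distinct eigenvalues and m[0][1] != 0, so the normalized
    eigenvectors are orthogonal and the returned matrix is unitary.
    """
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    root = cmath.sqrt(tr * tr - 4 * det)
    eigenvalues = [(tr + root) / 2, (tr - root) / 2]
    columns = []
    for lam in eigenvalues:
        v = (m[0][1], lam - m[0][0])  # solves the first row of (m - lam I) v = 0
        norm = math.hypot(abs(v[0]), abs(v[1]))
        columns.append((v[0] / norm, v[1] / norm))
    v_matrix = [[columns[0][0], columns[1][0]],
                [columns[0][1], columns[1][1]]]
    return eigenvalues, v_matrix

def demultiplex(u1, u2):
    """Return V, D, W with u1 = V D W and u2 = V D^dagger W."""
    eigenvalues, v = diagonalize(matmul(u1, dagger(u2)))
    d = [[cmath.sqrt(eigenvalues[0]), 0], [0, cmath.sqrt(eigenvalues[1])]]
    w = matmul(matmul(d, dagger(v)), u2)
    return v, d, w

def max_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

u1 = sample_unitary(0.3, 0.4, 1.1, -0.8)
u2 = sample_unitary(-0.2, 1.3, 0.5, 2.0)
v, d, w = demultiplex(u1, u2)
print(max_diff(matmul(matmul(v, d), w), u1) < 1e-9)
print(max_diff(matmul(matmul(v, dagger(d)), w), u2) < 1e-9)
```

For larger cofactors the same three steps apply, with the closed-form `diagonalize` replaced by a numerical eigendecomposition or Schur routine that returns orthonormal eigenvectors even when eigenvalues repeat.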
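Number of qubits and gate counts (last column: count for general $n$)

| Synthesis Algorithm        | 1 | 2 | 3  | 4   | 5    | 6     | 7      | $n$                                            |
|----------------------------|---|---|----|-----|------|-------|--------|------------------------------------------------|
| Original QR decomp. [3, 9] |   |   |    |     |      |       |        | $O(n^3 4^n)$                                   |
| Improved QR decomp. [20]   |   |   |    |     |      |       |        | $O(n 4^n)$                                     |
| Palindrome transform [2]   |   |   |    |     |      |       |        | $O(n 4^n)$                                     |
| QR [31, Table I]           |   | 4 | 64 | 536 | 4156 | 22618 | 108760 | $O(4^n)$                                       |
| QR (Theorem 9)             |   | 8 | 62 | 344 | 1642 | 7244  | 30606  | $2 \times 4^n - (2n+3) \times 2^n + 2n$        |
| CSD [21, p. 4]             |   | 8 | 48 | 224 | 960  | 3968  | 16128  | $4^n - 2 \times 2^n$                           |
| QSD ($\ell = 1$)           |   | 6 | 36 | 168 | 720  | 2976  | 12096  | $(3/4) \times 4^n - (3/2) \times 2^n$          |
| QSD ($\ell = 2$)           |   | 3 | 24 | 120 | 528  | 2208  | 9024   | $(9/16) \times 4^n - (3/2) \times 2^n$         |
| QSD ($\ell = 2$, optimized)|   | 3 | 20 | 100 | 444  | 1868  | 7660   | $(23/48) \times 4^n - (3/2) \times 2^n + 4/3$  |
| Lower bounds [27]          |   | 3 | 14 | 61  | 252  | 1020  | 4091   | $\lceil (4^n - 3n - 1)/4 \rceil$               |
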

Table 1 : A comparison of CNOT counts for unitary circuits generated by several algorithms (best results 
are in bold). We have labeled the algorithms by the matrix decomposition they implement. The results 
of this paper are boldfaced, including an optimized QR decomposition and three algorithms based on the 
Quantum Shannon Decomposition (QSD). Other rows represent previously published algorithms. Gate 
counts are not given for algorithms whose performance is not (generically) asymptotically optimal. 

Using the new decomposition, we now demultiplex the two side multiplexors in the Cosine-Sine
Decomposition (Theorem 10). This leads to the following decomposition of generic operators that can be
applied recursively.

Theorem 13 : The Quantum Shannon Decomposition. 




Hence an arbitrary $n$-qubit operator can be implemented by a circuit containing three multiplexed rotations
and four generic $(n-1)$-qubit operators, which can be viewed as cofactors of the original operator.

5.3 Recursive Gate Counts for Universal Circuits 

We present gate counts for the circuit synthesis algorithm implicit in Theorem 13. An important issue 
which remains is to choose the level at which to cease the recursion and handle end-cases with special 
purpose techniques. 

Thus, let $c_j$ be the least number of CNOT gates needed to implement a $j$-qubit unitary operator using
some known quantum circuit synthesis algorithm. Then Theorem 13 implies the following.

$$c_j \le 4 c_{j-1} + 3 \times 2^{j-1} \qquad (17)$$

One can now apply the decomposition of Theorem 13 recursively, which corresponds to iterating the above
inequality. If $\ell$-qubit operators may be implemented using $\le c_\ell$ CNOT gates, one can prove the following
inequality for $c_n$ by induction.

$$c_n \le 4^{n-\ell} (c_\ell + 3 \times 2^{\ell-1}) - 3 \times 2^{n-1} \qquad (18)$$
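The induction can be spot-checked by iterating the recurrence with equality and comparing against the closed form and the unoptimized QSD rows of Table 1. This is a small sketch with hypothetical function names; the dictionaries copy the table entries for $n = 2, \ldots, 7$.

```python
def qsd_count_by_recursion(n, l, c_l):
    """Iterate c_j = 4 c_{j-1} + 3 * 2^(j-1) from j = l + 1 up to j = n."""
    count = c_l
    for j in range(l + 1, n + 1):
        count = 4 * count + 3 * 2 ** (j - 1)
    return count

def qsd_count_closed_form(n, l, c_l):
    """Right-hand side of the closed-form bound for c_n."""
    return 4 ** (n - l) * (c_l + 3 * 2 ** (l - 1)) - 3 * 2 ** (n - 1)

# Rows of Table 1 for recursion stopped at one-qubit and two-qubit operators.
QSD_L1 = {2: 6, 3: 36, 4: 168, 5: 720, 6: 2976, 7: 12096}
QSD_L2 = {2: 3, 3: 24, 4: 120, 5: 528, 6: 2208, 7: 9024}

for n in range(2, 8):
    one = qsd_count_by_recursion(n, 1, 0)
    two = qsd_count_by_recursion(n, 2, 3)
    print(n, one, two,
          one == qsd_count_closed_form(n, 1, 0) == QSD_L1[n],
          two == qsd_count_closed_form(n, 2, 3) == QSD_L2[n])
```

Stopping at two-qubit operators with three CNOTs each lowers the leading coefficient from 3/4 to 9/16, which is the gap between the two table rows.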
Figure 3: Implementing a long-range CNOT gate with nearest-neighbor CNOTs.

We have recorded in Table 1 the formula for $c_n$ when the recursion bottoms out at one-qubit operators
($\ell = 1$ and $c_\ell = 0$), or two-qubit operators ($\ell = 2$ and $c_\ell = 3$ by [27, 34, 33]). In either case, we improve
on the best previously published algorithm (cf. [21]). However, to obtain our advertised CNOT-count of
$(23/48) \times 4^n - (3/2) \times 2^n + 4/3$ we shall need two further optimizations. Due to their more technical
nature, they are discussed in the Appendix.

Note that for $n = 3$, only 20 CNOTs are needed. This is the best known three-qubit circuit at present
(cf. [32]). Thus, our algorithm is the first efficient $n$-qubit circuit synthesis routine which also produces a
best-practice circuit in a small number of qubits.

6 Nearest-Neighbor Circuits 

A frequent criticism of quantum logic synthesis (especially highly optimized circuits which nonetheless
must conform to large theoretical lower bounds on the number of gates) is that the resulting circuits are
physically impractical. In particular, naive gate counts ignore many important physical problems which
arise in practice. Many such are grouped under the topic of quantum architectures [23], including
questions of (1) how best to arrange the qubits and (2) how to adapt a circuit diagram to a particular
physical layout. A spin chain^7 is perhaps the most restrictive architecture: the qubits are laid out in a line,
and all CNOT gates must act only on adjacent (nearest-neighbor) qubits. As spin-chains embed into two
and three dimensional grids, we view them as the most difficult architecture from the perspective of layout.
The work in [14] shows how to adapt Shor's algorithm to spin-chains without asymptotic increase in gate
counts. However, it is not yet clear if generic circuits can be adapted similarly.

As shown next, our circuits adapt well to the spin-chain limitations. Most CNOT gates used in our
decomposition already act on nearest neighbors, e.g., those gates implementing the two-qubit operators.
Moreover, Figure 2 shows that only $2^{n-k}$ CNOT gates of length $k$ (where the length of a local CNOT is 1) will
appear in the circuit implementing a multiplexed rotation with $(n-1)$ control bits. Figure 3 decomposes a
length $k$ CNOT into $4k - 4$ length 1 CNOTs. Summation shows that $9 \times 2^{n-1} - 8$ nearest-neighbor CNOTs
suffice to implement the multiplexed rotation. Therefore restricting CNOT gates to nearest-neighbor
interactions increases CNOT count by at most a factor of nine.
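The $4k - 4$ figure can be reproduced with a simple ladder construction, sketched below. This is one standard way to reach that count and is not necessarily the circuit drawn in Figure 3; the function names are hypothetical. Because a circuit of CNOTs only permutes classical bitstrings, checking every classical input is enough to confirm the whole operator.

```python
def nearest_neighbor_cnot(k):
    """Adjacent CNOTs (control, target) that together act as CNOT(0 -> k).

    Two descending ladders, each followed by an ascending ladder that
    undoes the side effects on the intermediate wires.
    """
    down_all = [(i, i + 1) for i in range(k)]                # C(0,1) ... C(k-1,k)
    up_all = [(i, i + 1) for i in range(k - 2, -1, -1)]      # C(k-2,k-1) ... C(0,1)
    down_inner = [(i, i + 1) for i in range(1, k)]           # C(1,2) ... C(k-1,k)
    up_inner = [(i, i + 1) for i in range(k - 2, 0, -1)]     # C(k-2,k-1) ... C(1,2)
    return down_all + up_all + down_inner + up_inner

def run(gates, bits):
    """Apply CNOT gates in order to a list of classical bits."""
    bits = list(bits)
    for control, target in gates:
        bits[target] ^= bits[control]
    return bits

def implements_long_cnot(k):
    """Check the gate list against CNOT(0 -> k) on all 2^(k+1) inputs."""
    gates = nearest_neighbor_cnot(k)
    for x in range(2 ** (k + 1)):
        bits = [(x >> i) & 1 for i in range(k + 1)]
        expected = list(bits)
        expected[k] ^= expected[0]
        if run(gates, bits) != expected:
            return False
    return True

for k in range(2, 7):
    print(k, len(nearest_neighbor_cnot(k)), 4 * k - 4, implements_long_cnot(k))
```

For $k = 1$ the construction degenerates to the single local CNOT, so the $4k - 4$ formula applies only for $k \ge 2$.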

7 Conclusions and Future Work 

Our approach to quantum circuit synthesis emphasizes simplicity, a well-pronounced top-down structure, 
and practical computation via the Cosine-Sine Decomposition. By introducing the quantum multiplexor 
and optimizing its singly-controlled version, we derived a quantum analogue of the well-known Shannon 
decomposition of Boolean functions. Applying this decomposition recursively to quantum operators leads 
to a circuit synthesis algorithm in terms of quantum multiplexors. As seen in Table 1, our techniques

7 The term arises since the qubit is also commonly thought of as an abstract particle with quantum spin 1/2. 
achieve the best known controlled-NOT counts, both for small numbers of qubits and asymptotically. Our
approach has the additional advantage that it co-opts all results on small numbers of qubits - e.g., future 
specialty techniques developed for three-qubit quantum logic synthesis can be used as terminal cases of 
our recursion. We have also discussed various problems specific to quantum computation, specifically 
initialization of quantum registers and mapping to the nearest-neighbor gate library. 
Appendix A: Additional Circuit Optimizations

Section 5 shows that recursively applying the Quantum Shannon Decomposition until only $\ell$-qubit
operators remain produces circuits with at most $4^{n-\ell}(c_\ell + 3 \times 2^{\ell-1}) - 3 \times 2^{n-1}$ CNOT gates.

To obtain our advertised CNOT-count, we apply additional optimizations below that save $(4^{n-\ell} - 1)/3$
CNOTs in general, and an additional $4^{n-2} - 1$ in the case $\ell = 2$, $c_\ell = 3$. This results in the following,
final CNOT count.

$$c_n \le (23/48) \times 4^n - (3/2) \times 2^n + 4/3 \qquad (19)$$

Observe that the leading term is slightly below $4^n/2$, whereas the leading term in the lower bound from
[27] is $4^n/4$. Thus, our result cannot be improved by more than a factor of two.
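The arithmetic behind the final count can be written out as a short worked derivation. It assumes $\ell = 2$ and $c_2 = 3$, starts from the bound stated at the beginning of this appendix, and subtracts the two savings quoted above.

```latex
\begin{align*}
c_n &\le 4^{n-2}\,(3 + 3 \times 2) - 3 \times 2^{n-1}
        - \frac{4^{n-2} - 1}{3} - \left(4^{n-2} - 1\right) \\
    &= \left(9 - \tfrac{1}{3} - 1\right) 4^{n-2}
        - \tfrac{3}{2} \times 2^{n} + \tfrac{1}{3} + 1 \\
    &= \tfrac{23}{3} \cdot \tfrac{4^{n}}{16} - \tfrac{3}{2} \times 2^{n} + \tfrac{4}{3}
     = \tfrac{23}{48} \times 4^{n} - \tfrac{3}{2} \times 2^{n} + \tfrac{4}{3}.
\end{align*}
```

For $n = 3$ this evaluates to $92/3 - 12 + 4/3 = 20$, matching the optimized QSD entry in Table 1.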



A.1 Implementing Multiplexed-$R_y$ with Controlled-Z

Recall the two-qubit controlled-Z gate, given by the following matrix

$$\text{Controlled-}Z = \begin{pmatrix} 1 & & & \\ & 1 & & \\ & & 1 & \\ & & & -1 \end{pmatrix} \qquad (20)$$



The controlled-Z gate is commonly denoted by a "•" on each qubit, connected by a vertical line, as shown 
in the diagram below. This gate can be implemented using a single CNOT with the desired orientation, and 
one-qubit gates (whose physical realizations are typically simpler). 



R y (n/2) 



-0- 



Ry(-K/2) 



I 



- R y (n/2) 



-9- 



Ry(-K/2) - 
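The identity is easy to confirm with explicit matrices. The check below is a sketch that assumes the convention $R_y(\theta) = \begin{pmatrix} \cos\theta/2 & -\sin\theta/2 \\ \sin\theta/2 & \cos\theta/2 \end{pmatrix}$ and takes the control as the most significant qubit; the helper names are hypothetical.

```python
import math

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def kron(a, b):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def ry(t):
    c, s = math.cos(t / 2), math.sin(t / 2)
    return [[c, -s], [s, c]]

IDENTITY = [[1, 0], [0, 1]]
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
CZ = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, -1]]

def cz_from_cnot():
    """Ry(pi/2) on the target, then CNOT, then Ry(-pi/2) on the target."""
    before = kron(IDENTITY, ry(math.pi / 2))
    after = kron(IDENTITY, ry(-math.pi / 2))
    return matmul(after, matmul(CNOT, before))

def max_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

print(max_diff(cz_from_cnot(), CZ) < 1e-12)
```

Since the controlled-Z matrix is symmetric under exchanging the two qubits, the same check passes with the roles of control and target swapped.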



The statements and proofs of Theorem 8 and Figure 2 still hold for multiplexed $R_y$ gates if all CNOTs are
replaced with controlled-Z gates. Thus the central multiplexed $R_y$ in the Cosine-Sine decomposition may
be implemented with $2^{n-1}$ controlled-Z gates, of which one is initial (or terminal). As the initial controlled-Z
gate is diagonal, it may be absorbed into the neighboring generic multiplexor. This saves one gate at each
step of the recursion, for the total savings of $(4^{n-\ell} - 1)/3$ CNOT gates.






A.2 Extracting Diagonals to Improve Decomposition of Two-Qubit Operators 

Terminate the recursion when only two-qubit operators remain; there will be $4^{n-2}$ of them. These two-qubit
operators all act on the least significant qubits and are separated by the controls of multiplexed rotations.
To perform better optimization, we recall a known result on the decomposition of two-qubit operators.

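Theorem 14 : Decomposition of a two-qubit operator [27].

[Circuit diagram: an arbitrary two-qubit operator equals a two-qubit diagonal gate followed by a circuit containing two CNOT gates and one-qubit gates.]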

We use Theorem 14 to decompose the rightmost two-qubit operator; migrate the diagonal through the
select bits of the multiplexor to the left, and join it with the two-qubit operator on the other side. Now we
decompose this operator, and continue the process. Since we save one CNOT in the implementation of
every two-qubit gate but the last, we improve the $\ell = 2$, $c_\ell = 3$ count by $4^{n-2} - 1$ gates.